Add proposal for per-tenant cardinality API #7335

CharlieTLe wants to merge 6 commits into cortexproject:master from
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Charlie Le <charlie_le@apple.com>
> Currently, Cortex tenants lack visibility into which metrics, labels, and label-value pairs contribute the most series in ingesters. Without this information, debugging high-cardinality issues requires operators to inspect TSDB internals directly on ingester instances, which is impractical in a multi-tenant, distributed environment.
>
> Prometheus itself exposes a `/api/v1/status/tsdb` endpoint that provides cardinality statistics from the TSDB head. This proposal brings equivalent functionality to Cortex as a multi-tenant, distributed API.
I am not a fan of the TSDB status API name... the Prometheus API might change and add more stuff. A dedicated `api/v1/cardinality` might be better?

I agree. We might not use the Prometheus TSDB in the future.
> ## Out of Scope
>
> - **Long-term storage cardinality analysis**: This endpoint only covers in-memory TSDB head data in ingesters. Analyzing cardinality across compacted blocks in object storage is a separate concern. A future long-term cardinality API could reuse portable fields (see [Extensibility](#extensibility-to-long-term-storage)) or introduce a separate endpoint.

Do we plan to have a different API for long-term storage cardinality? We should aim for the same API endpoint even though we don't have to design for it now.

I agree we should plan for this too, probably sooner than later.
> Expose per-tenant TSDB head cardinality statistics via a REST API endpoint on the Cortex query path. The endpoint should:
>
> 1. Be compatible with the Prometheus `/api/v1/status/tsdb` response format.

I am not sure this needs to be part of the goals. Does it need to be compatible? I think our API response format is already incompatible today.
> - **Authentication**: Requires `X-Scope-OrgID` header (standard Cortex tenant authentication).
> - **Query Parameter**: `limit` (optional, default 10) - controls the number of top items returned per category.

I agree, we need start and end. Sometimes cardinality issues are specific in time.
```protobuf
message TSDBStatusResponse {
  uint64 num_series = 1;
  int64 min_time = 2;
  int64 max_time = 3;
```

Do we need min/max? How do we aggregate these in the final response? min(min_t) and max(max_t)?
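The aggregation the comment suggests can be sketched as follows; `tsdbStatus` is an illustrative Go stand-in for the per-ingester proto responses, not an actual Cortex type:

```go
package main

import "fmt"

// tsdbStatus mirrors the min/max fields of the proposed
// TSDBStatusResponse message (illustrative only).
type tsdbStatus struct {
	MinTime int64 // milliseconds since epoch
	MaxTime int64
}

// mergeTimeRange aggregates per-ingester head time ranges the way the
// comment suggests: min(min_t) and max(max_t) across all responses.
func mergeTimeRange(responses []tsdbStatus) (minT, maxT int64) {
	if len(responses) == 0 {
		return 0, 0
	}
	minT, maxT = responses[0].MinTime, responses[0].MaxTime
	for _, r := range responses[1:] {
		if r.MinTime < minT {
			minT = r.MinTime
		}
		if r.MaxTime > maxT {
			maxT = r.MaxTime
		}
	}
	return minT, maxT
}

func main() {
	minT, maxT := mergeTimeRange([]tsdbStatus{
		{MinTime: 100, MaxTime: 900},
		{MinTime: 50, MaxTime: 800},
	})
	fmt.Println(minT, maxT) // 50 900
}
```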
> 2. **`chunkCount` omitted**: Prometheus includes a `chunkCount` field (from `prometheus_tsdb_head_chunks`). In a distributed system with replication, chunk counts across ingesters cannot be meaningfully aggregated — chunks are an ingester-local storage detail, and summing/dividing by the replication factor does not produce a useful number.
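By contrast, series counts can be approximated across replicas. A later commit in this PR documents replication-factor division on the head path as a best-effort approximation; a minimal sketch (names are illustrative, not Cortex code):

```go
package main

import "fmt"

// approxSeriesCount sketches the head-path aggregation: each series is
// replicated to up to RF ingesters, so summed per-ingester counts are
// divided by the replication factor. This is a best-effort
// approximation, exact only when all replicas are healthy and fully
// caught up.
func approxSeriesCount(perIngester []uint64, replicationFactor uint64) uint64 {
	var sum uint64
	for _, n := range perIngester {
		sum += n
	}
	return sum / replicationFactor
}

func main() {
	// Three ingesters at RF=3, each holding the same 1000 series.
	fmt.Println(approxSeriesCount([]uint64{1000, 1000, 1000}, 3)) // 1000
}
```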
> **Open question**: Should we adopt the `headStats` wrapper to maintain client compatibility with Prometheus tooling? The trade-off is compatibility vs simplicity — the flat format is easier to consume for Cortex-specific clients, but adopting the Prometheus format would allow reuse of existing client libraries.

Does any Prometheus tool consume this today? Why is compatibility a concern?
> | Field | Head-specific | Rationale |
> | --- | --- | --- |
> | `labelValueCountByLabelName` | No | Portable to block storage |
> | `seriesCountByLabelValuePair` | No | Portable to block storage |
> | `memoryInBytesByLabelName` | **Yes** | In-memory byte usage has no analogue in object storage |
> | `minTime` / `maxTime` | **Yes** | Reflects head time range, not total storage |

Do we need to add those head-specific fields?
…ore gateways

Add source=blocks query parameter to analyze cardinality from compacted blocks in object storage. The blocks path fans out to store gateways, which compute statistics from block index headers (cheap label value counts) and posting list expansion (exact series counts per metric). Results are cached per immutable block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
…plify

Address feedback from PR cortexproject#7335 review:

- Rename endpoint from /api/v1/status/tsdb to /api/v1/cardinality
- Drop Prometheus compatibility as a goal
- Add start/end time range query parameters
- Drop head-specific fields (numLabelPairs, memoryInBytesByLabelName, minTime, maxTime) to unify response across both sources
- Remove API Compatibility and Field Portability sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
…limit

Make start/end required for source=blocks to prevent unbounded block scanning. Add cardinality_max_query_range per-tenant limit (default 24h) to give operators control over the blast radius.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
Critical:
- Fix blocks path aggregation: no SG RF division since GetClientsFor
routes each block to exactly one store gateway
Significant:
- Add min_time, max_time, block_ids to store gateway CardinalityRequest
- Specify MaxErrors=0 for head path with availability implications
- Add consistency check and retry logic for blocks path
- Document RF division as best-effort approximation
Moderate:
- Wrap responses in standard {status, data} Prometheus envelope
- Change HTTP 422 to HTTP 400 for limit violations
- Add Error Responses section with all validation scenarios
- Add approximated field for block overlap and partial results
- Add Observability section with metrics
- Add per-tenant concurrency limit and query timeout
- Reject start/end for source=head instead of silently ignoring
Low:
- Add Rollout Plan with phased approach and feature flag
- Document rolling upgrade compatibility (Unimplemented handling)
- Document Query Frontend bypass
- Improve caching: full results keyed by ULID, limit at response time
- Add missing files to implementation section
- Move shared proto to pkg/cortexpb/cardinality.proto
- Rename TSDBStatus* to Cardinality* throughout
- Add limit upper bound (max 512)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Charlie Le <charlie_le@apple.com>
Summary
Proposal for a per-tenant cardinality API (`GET /api/v1/cardinality`) that exposes cardinality statistics (top metrics by series count, top labels by value count, top label-value pairs by series count) across two data sources:

- `source=head`: Fans out to ingesters via the distributor, aggregates TSDB head stats with RF-based deduplication.
- `source=blocks`: Fans out to store gateways via `BlocksFinder` + `GetClientsFor`, computes cardinality from block indexes with per-block caching.

Key design points:

- `start`/`end` required for the blocks path, rejected for the head path (head cannot sub-filter)
- Per-tenant limits: `cardinality_api_enabled`, `cardinality_max_query_range`, `cardinality_max_concurrent_requests`, `cardinality_query_timeout`
- `{status, data}` Prometheus response envelope with an `approximated` field for block overlap / partial results

🤖 Generated with Claude Code